We explain a possible mechanism by which information diffusing on a network spreads extraordinarily far from a seed node. On the basis of the model of tweet diffusion on Twitter that we constructed in the previous work, we show that the correlation between the retweet rates enhances the chance of explosive diffusion, shifting the transition point at which the diffusion becomes explosive.

A famous phenomenon on social networks in which a post by a single user collects enormous attention, or 'goes viral', may reflect structural and dynamical properties that have not been seen in other classical networks on the web. Not only has the information flow on the web become extraordinarily active in the last decade, but the web services that people use to broadcast and receive information have also changed greatly. It is important to investigate what their characteristic properties are and how they affect the information flow in order to predict their behavior.

The classical form of information diffusion on the web occurs through users accessing the spreaders of the information, e.g., web news, Wikipedia, and blogs. On the other hand, a different kind of information diffusion on the web is becoming increasingly commonplace; the major examples are retweets on Twitter and shares on Facebook. Information diffusion on such social networks is qualitatively different from the former; instead of accessing the spreaders, users receive the information and transmit it to other users, thereby helping the information to diffuse.

In the previous work, we constructed a model to describe the typical behavior of such an information diffusion process. In the present paper, we focus on the situation where the diffusion becomes explosive, that is, where it spreads to users who are extraordinarily far from a seed user. As in the previous work, we consider tweet diffusion on Twitter as an example. The higher the fraction of retweeters among the viewers of the tweet (we call it the retweet rate), the wider the range of the diffusion, which also results in a large number of retweets.

A naive explanation of why a tweet receives many retweets would be that it is retweeted by a single user with a large number of followers. Although this might be an important factor, even accounts with millions of followers do not receive thousands of retweets for their daily tweets. Therefore, such a naive explanation does not account for the whole mechanism of explosive diffusion. The cooperation of many users is presumably crucial to spreading the tweet.

Let us define explosive diffusion more precisely. We assume a loopless tree with a homogeneous degree distribution for the underlying network and an infinite path length from a seed user. Mathematically, we define an explosive diffusion as a diffusion that never stops on such a network. Then there exists a transition point for the retweet rate at which the diffusion becomes explosive. Even though diffusions always die out in reality because of the loop structure, the temporal decay of the retweet rate, and the finite path length, such a transition point is a plausible reference for a diffusion to be explosive.

In the previous work, we neglected the effect of correlation between the retweet rates of the followers. Whenever 
the diffusion becomes explosive, however, we can easily imagine that the effect of correlation plays an important role. 
We will show that indeed it can largely enhance the chance of the explosive diffusion. 

This paper is organized as follows. After describing the diffusion model which we introduced in the previous work, 
we discuss the transition point of the explosive diffusion in the case of independent retweet rates. Then we show that 
the transition point is shifted owing to the correlation between the retweet rates. 

Model. In order to model the information diffusion on a social network, we classify the informed nodes by their distance from the seed node; see Fig. 1. We call the nodes at the same distance a generation and discuss the diffusion process with respect to the generations. We denote the number of nodes in the $g$th generation by $N_g$. Among the $N_0$ nodes that are directly connected to the seed node, some contribute to the diffusion and produce the nodes in the first generation. Assuming that the base graph is a loopless tree with a homogeneous degree distribution, we estimate the number $N_1$ of the nodes in the first generation as

\[ N_1 = \beta_1 k N_0 =: J_1 N_0, \tag{1} \]

where $\beta_1$ is a positive stochastic variable that indicates the fraction of the $N_0$ nodes contributing to the diffusion and $k$ is the average number of links per node; in the case of Twitter, $\beta_1$ is the retweet rate and $k$ is the average number of followers. Applying this process to all generations, we obtain the following random multiplicative
process:

\[ N_m = N_0 \prod_{g=1}^{m} J_g = N_0 k^m \prod_{g=1}^{m} \beta_g \qquad (m \ge 1). \tag{2} \]

FIG. 1: (Color online) Information diffusion on a social network. The node at the center represents the seed, and the linked nodes can receive the information. A solid line indicates that the information has diffused through the link. We ignore the over-counting of nodes such as the one illustrated by the wavy line; i.e., we assume a tree structure.

Summing up $N_m$ for all $m$, we obtain the total number of viewers $N_{\mathrm{tot}}$ as

\[ N_{\mathrm{tot}} = \sum_{m=0}^{\infty} N_m = N_0 \Bigl( 1 + \sum_{m=1}^{\infty} \prod_{g=1}^{m} J_g \Bigr). \tag{3} \]

The transition point of the explosive diffusion is the point where $N_{\mathrm{tot}}$ diverges. In the present paper, we assume that every retweet rate $\beta_g$ obeys a common probability distribution. In the case of Twitter in particular, we confirmed [3] that $\beta_g$ roughly obeys a lognormal distribution, although its average and variance depend on the character of the seed node at $g = 1$. We therefore set the distribution of the stochastic variable $J_g = k\beta_g$ to be

\[ p(J_g) = \frac{1}{J_g \sqrt{2\pi}\,\sigma} \exp\Bigl[ -\frac{1}{2\sigma^2} \bigl( \ln J_g - \mu \bigr)^2 \Bigr] \tag{4} \]

and express $J_g$ as

\[ J_g = e^{\mu + \xi_g}, \tag{5} \]

where $\mu$ and $\sigma^2$ are constants and $\xi_g$ is a stochastic variable that obeys a Gaussian distribution $\mathcal{N}(0, \sigma^2)$.
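
The random multiplicative process of Eqs. (2), (3), and (5) can be sketched numerically. The following Python snippet is only an illustration under stated assumptions: the function name `simulate_cascade`, the parameter values, and the truncation at a finite number of generations are hypothetical choices and are not taken from the model's definition. The factors $J_g$ are drawn independently here.

```python
import math
import random


def simulate_cascade(n0, mu, sigma, max_generations, rng):
    """Return [N_0, N_1, ..., N_M] for one realization of N_m = N_0 * prod_{g<=m} J_g.

    Assumption: J_g = exp(mu + xi_g) with xi_g drawn independently from a
    Gaussian with mean 0 and standard deviation sigma.
    """
    sizes = [float(n0)]
    for _ in range(max_generations):
        j_g = math.exp(mu + rng.gauss(0.0, sigma))  # one lognormal factor J_g
        sizes.append(sizes[-1] * j_g)               # N_g = J_g * N_{g-1}
    return sizes


rng = random.Random(0)  # fixed seed so that the run is reproducible
sizes = simulate_cascade(n0=100, mu=-1.0, sigma=0.5, max_generations=20, rng=rng)
n_tot = sum(sizes)      # truncated version of N_tot = sum_m N_m
```

With $\mu < 0$ and a moderate $\sigma$, as in this example, the generation sizes typically shrink and the truncated sum stays finite.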

The case of independent diffusion rates. In the following, we consider the average number of informed nodes $\langle N_{\mathrm{tot}} \rangle$, normalized by $N_0$. In the case where the stochastic variables $J_g$ are independent of each other and all their averages are the same, i.e., $\langle J_g \rangle = \langle J \rangle$, we have

\[ \Bigl\langle \frac{N_{\mathrm{tot}}}{N_0} \Bigr\rangle = \sum_{m=0}^{\infty} \langle J \rangle^m = \frac{1}{1 - \langle J \rangle} \qquad (\text{for } \langle J \rangle < 1), \tag{6} \]

where $\langle \cdots \rangle$ stands for the statistical average. In the case of the lognormal distribution, we have $\langle J \rangle = \exp(\mu + \sigma^2/2)$.

Since $J_g = \beta_g k$, and hence $\langle J \rangle = \langle \beta \rangle k$, the sum diverges at $\langle J \rangle = 1$, which gives the transition point $\beta_{\mathrm{ex}} = k^{-1}$ for the explosive diffusion. In the case of the Twitter network, $k \sim O(10^2)$ and hence the transition point is $\beta_{\mathrm{ex}} \sim O(10^{-2})$. On the other hand, in the case of some major news accounts such as The New York Times (@nytimes) and Reuters Top News (@Reuters), $\langle \beta \rangle \sim O(10^{-5})$, which is much lower than the transition point. Because of the restrictions of the Twitter API, we cannot measure the retweet rate $\beta_g$ of an explosive diffusion explicitly. Although the possibility of reaching the transition point $\beta_{\mathrm{ex}} = k^{-1}$ depends on the average and the variance of the retweet rate, the threshold appears to be too high to reach in reality if we assume that the $J_g$ are independent of each other.
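
The orders of magnitude quoted above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the values $k = 100$ and $\langle \beta \rangle = 10^{-5}$ are only the rough orders of magnitude mentioned in the text, not measured data.

```python
import math


def mean_J(mu, sigma):
    """Mean of the lognormal variable J = exp(mu + xi), xi ~ N(0, sigma^2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def expected_total_ratio(mean_j):
    """<N_tot>/N_0 = 1/(1 - <J>) for <J> < 1; the geometric sum diverges otherwise."""
    if mean_j >= 1.0:
        return math.inf
    return 1.0 / (1.0 - mean_j)


def transition_retweet_rate(k):
    """Transition point of the independent case: <J> = <beta> k = 1."""
    return 1.0 / k


k = 100.0                                 # assumed order of magnitude of followers
beta_ex = transition_retweet_rate(k)      # threshold on the mean retweet rate
ratio = expected_total_ratio(k * 1e-5)    # <J> = 1e-3, far below the threshold
```

For these illustrative values, $\langle J \rangle = 10^{-3}$, so the expected total number of viewers exceeds $N_0$ by only about 0.1 percent.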

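The case of correlated diffusion rates. Let us now consider the quantity $\langle N_{\mathrm{tot}}/N_0 \rangle$ in the case where the stochastic variables $J_g$ are not independent of each other. We take the variables $\xi_g$ to obey the joint Gaussian distribution

\[ P(\xi) = \frac{1}{Z} \exp\Bigl( -\frac{1}{2} \sum_{g,g'} \xi_g \bigl( \Sigma^{-1} \bigr)_{gg'} \xi_{g'} \Bigr), \]

where $Z$ is the normalization factor and $\Sigma^{-1}$ is the inverse matrix of the covariance matrix $\Sigma_{gg'} = \langle \xi_g \xi_{g'} \rangle$. We consider the following matrix for $\Sigma^{-1}$:

\[ \Sigma^{-1} = \begin{pmatrix} \sigma^{-2} & -\eta & & \\ -\eta & \sigma^{-2} & -\eta & \\ & -\eta & \sigma^{-2} & \ddots \\ & & \ddots & \ddots \end{pmatrix}. \tag{9} \]

The matrix $\Sigma^{-1}$ is infinite-dimensional; we first treat it as an $N \times N$ matrix and take the limit $N \to \infty$ at the end. With this sign convention, $\eta > 0$ corresponds to a positive correlation between neighboring generations.

We can diagonalize the matrix $\Sigma^{-1}$ with a unitary matrix $U$ as follows:

\[ x = U\xi, \qquad U_{mn} = \frac{1}{\sqrt{\mathcal{N}}} \sin(m k_n), \qquad P(x) = \frac{1}{Z} \exp\Bigl( -\frac{1}{2} \sum_{j=1}^{N} a_j x_j^2 \Bigr), \tag{10} \]

where $\lambda_\alpha = -2\cos k_\alpha$, $k_\alpha = \pi\alpha/(N+1)$, and $\mathcal{N} = (N+1)/2$. After this diagonalization, we have

\[ \Bigl\langle \prod_{g=1}^{m} J_g \Bigr\rangle = e^{m\mu} \int dx\, P(x) \exp\Bigl( \sum_{j=1}^{N} b_j^{(m)} x_j \Bigr), \tag{12} \]

where

\[ a_j = \sigma^{-2} + \eta\lambda_j, \qquad b_j^{(m)} = \frac{1}{\sqrt{\mathcal{N}}} \sum_{g=1}^{m} \sin(g k_j). \tag{13} \]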
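
The diagonalization by the sine matrix $U$ can be verified numerically. The sketch below is written for a generic tridiagonal Toeplitz matrix with diagonal element `diag` and off-diagonal element `off` (both names are hypothetical), whose eigenvalues are `diag + 2 * off * cos(k_n)`; the sign and size of the off-diagonal element used in the example are assumptions chosen only for illustration.

```python
import math


def sine_eigvec(n, N):
    """n-th column of U: U_mn = sqrt(2/(N+1)) * sin(m * k_n), k_n = pi*n/(N+1)."""
    k_n = math.pi * n / (N + 1)
    norm = math.sqrt(2.0 / (N + 1))
    return [norm * math.sin(m * k_n) for m in range(1, N + 1)]


def tridiag_apply(diag, off, v):
    """Multiply the tridiagonal Toeplitz matrix (diag, off) by the vector v."""
    N = len(v)
    out = []
    for i in range(N):
        s = diag * v[i]
        if i > 0:
            s += off * v[i - 1]
        if i < N - 1:
            s += off * v[i + 1]
        out.append(s)
    return out


def max_eig_residual(diag, off, N):
    """Largest entry of |M u_n - a_n u_n| over all n, with a_n = diag + 2*off*cos(k_n)."""
    worst = 0.0
    for n in range(1, N + 1):
        u = sine_eigvec(n, N)
        a_n = diag + 2.0 * off * math.cos(math.pi * n / (N + 1))
        Mu = tridiag_apply(diag, off, u)
        worst = max(worst, max(abs(x - a_n * y) for x, y in zip(Mu, u)))
    return worst


# Example: sigma^-2 = 4 on the diagonal and an off-diagonal element of -0.3.
residual = max_eig_residual(diag=4.0, off=-0.3, N=12)
```

The residual vanishes up to rounding error, which confirms that the columns of $U$ are eigenvectors for any value of the off-diagonal element.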
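Substituting these values into Eq. (12), we obtain

\[ \Bigl\langle \prod_{g=1}^{m} J_g \Bigr\rangle = e^{m\mu} \exp\Bigl[ \sum_{j=1}^{N} \frac{\bigl( b_j^{(m)} \bigr)^2}{2 a_j} \Bigr] = e^{m\mu} \exp\Bigl[ \sum_{j=1}^{N} \sum_{g,g'=1}^{m} \frac{\sin(g k_j) \sin(g' k_j)}{a_j (N+1)} \Bigr] = e^{m\mu} \exp\Bigl[ \frac{1}{2(N+1)} \sum_{j=1}^{N} \sum_{g,g'=1}^{m} \frac{\cos k_j (g-g') - \cos k_j (g+g')}{a_j} \Bigr]. \tag{14} \]

Let us now consider the case where $\epsilon = \eta/\sigma^{-2} \ll 1$ and analyze the expansion of $a_j^{-1}$ with respect to $\epsilon$:

\[ a_j^{-1} = \sigma^2 \bigl( 1 + 2\epsilon \cos k_j + o(\epsilon) \bigr). \tag{15} \]

We simply obtain $\langle \prod_{g=1}^{m} J_g \rangle = \langle J \rangle^m$ from the zeroth-order expansion. Including the first-order correction, we have

\[ \Bigl\langle \prod_{g=1}^{m} J_g \Bigr\rangle = \langle J \rangle^m \exp\Bigl[ \frac{\epsilon\sigma^2}{N+1} \sum_{j=1}^{N} \sum_{g,g'=1}^{m} \cos k_j \bigl\{ \cos k_j (g-g') - \cos k_j (g+g') \bigr\} \Bigr] = \langle J \rangle^m \exp\Bigl[ \frac{\epsilon\sigma^2}{2(N+1)} \sum_{j=1}^{N} \sum_{g,g'=1}^{m} \bigl\{ \cos k_j (g-g'+1) + \cos k_j (g-g'-1) - \cos k_j (g+g'+1) - \cos k_j (g+g'-1) \bigr\} \Bigr]. \tag{16} \]

After some algebra, we obtain

\[ \Bigl\langle \prod_{g=1}^{m} J_g \Bigr\rangle = \langle J \rangle^m e^{(m-1)\epsilon\sigma^2}. \tag{17} \]

Hence, the total number of the informed nodes normalized by $N_0$ reads

\[ \Bigl\langle \frac{N_{\mathrm{tot}}}{N_0} \Bigr\rangle = 1 + \sum_{m=1}^{\infty} \langle J \rangle^m e^{(m-1)\epsilon\sigma^2} = 1 + \frac{\langle J \rangle}{1 - \langle J \rangle e^{\epsilon\sigma^2}}. \tag{18} \]

Therefore, the transition point for the explosive diffusion $\beta_{\mathrm{ex}}$ now reads

\[ \beta_{\mathrm{ex}} = k^{-1} e^{-\epsilon\sigma^2}. \tag{19} \]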
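
The first-order result for the average of the product can be compared with the exact Gaussian average at finite $N$. The sketch below assumes that the inverse covariance matrix is tridiagonal with $\sigma^{-2}$ on the diagonal and $-\eta$ next to it, so that its eigenvalues are $\sigma^{-2} - 2\eta\cos k_j$; the function name and the parameter values are hypothetical. It evaluates $\ln\bigl( \langle \prod_{g \le m} J_g \rangle / \langle J \rangle^m \bigr)$ with $\langle J \rangle = \exp(\mu + \sigma^2/2)$ and compares it with $(m-1)\epsilon\sigma^2$.

```python
import math


def log_product_mean_shift(sigma, eta, m, N=200):
    """ln( <prod_{g<=m} J_g> / <J>^m ) from the exact Gaussian integral.

    Assumption: Sigma^{-1} has sigma^-2 on the diagonal and -eta beside it,
    so its eigenvalues are a_j = sigma^-2 - 2*eta*cos(k_j).
    """
    total = 0.0
    for j in range(1, N + 1):
        k_j = math.pi * j / (N + 1)
        a_j = sigma ** -2 - 2.0 * eta * math.cos(k_j)
        # b_j^2 with b_j = sqrt(2/(N+1)) * sum_{g<=m} sin(g k_j)
        b_sq = (2.0 / (N + 1)) * sum(math.sin(g * k_j) for g in range(1, m + 1)) ** 2
        total += b_sq / (2.0 * a_j)   # Gaussian integral: exp(sum_j b_j^2 / (2 a_j))
    return total - 0.5 * m * sigma ** 2


sigma, eta, m = 1.0, 0.01, 5
eps = eta * sigma ** 2                     # epsilon = eta / sigma^-2
exact = log_product_mean_shift(sigma, eta, m)
first_order = (m - 1) * eps * sigma ** 2   # exponent of the first-order result
```

For small $\epsilon$ the two values agree up to a correction of second order in $\epsilon$.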
Let us next write down the transition point in terms of the correlation coefficient of the retweet rates instead of the off-diagonal element $\epsilon = \eta/\sigma^{-2}$ of the matrix $\Sigma^{-1}$. The matrix $\Sigma^{-1}$, which contains the off-diagonal element $\epsilon$, is the inverse of the covariance matrix $\Sigma$ of $\xi_g$, which is related to that of $J_g$ by Eq. (5). Expressing the inverse of the covariance matrix as $\Sigma^{-1} = \sigma^{-2} F_N$, the covariance matrix $\Sigma$ reads

\[ \Sigma = \frac{\sigma^2}{|\det F_N|} \begin{pmatrix} \det F_{N-1} & \epsilon \det F_{N-2} & \epsilon^2 \det F_{N-3} & \cdots \\ \epsilon \det F_{N-2} & \det F_1 \det F_{N-2} & \epsilon \det F_{N-3} & \cdots \\ \epsilon^2 \det F_{N-3} & \epsilon \det F_{N-3} & \det F_2 \det F_{N-3} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \tag{20} \]

where the subscript of the matrix $F_N$ denotes the number of dimensions. Noting that the determinant of $F_n$ has the following recursion relation,

\[ \det F_n = \det F_{n-1} - \epsilon^2 \det F_{n-2}, \]

in the limit where $N \to \infty$, the ratio $r := \det F_{N-1}/\det F_N$ satisfies

\[ \epsilon^2 r^2 - r + 1 = 0. \tag{21} \]

Considering the fact that the elements of $\Sigma$, which are proportional to $\epsilon^{n-1} r^{n}$, need to remain finite as $n \to \infty$, we have

\[ r = \frac{1 - \sqrt{1 - 4\epsilon^2}}{2\epsilon^2}. \]

Hence, the off-diagonal element $\epsilon$ is related to the covariance of $\xi_g$ as

\[ \epsilon = \eta/\sigma^{-2} = r^{-2} \sigma^{-2} \langle \xi_g \xi_{g+1} \rangle. \]
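
The closed form of $r$ can be checked against the determinant recursion. In the sketch below, $r$ is assumed to be the large-$n$ limit of $\det F_{n-1}/\det F_n$, with $\det F_0 = \det F_1 = 1$ as starting values; the function names and the value $\epsilon = 0.1$ are hypothetical.

```python
import math


def det_ratio(eps, n_max=200):
    """Iterate det F_n = det F_{n-1} - eps^2 det F_{n-2}; return det F_{n-1}/det F_n."""
    d_prev, d_curr = 1.0, 1.0   # det F_0 = 1 (empty matrix), det F_1 = 1
    for _ in range(2, n_max + 1):
        d_prev, d_curr = d_curr, d_curr - eps ** 2 * d_prev
    return d_prev / d_curr


def r_closed_form(eps):
    """Smaller root of eps^2 r^2 - r + 1 = 0."""
    return (1.0 - math.sqrt(1.0 - 4.0 * eps ** 2)) / (2.0 * eps ** 2)


r_iter = det_ratio(0.1)
r_exact = r_closed_form(0.1)
```

Both values are close to $1 + \epsilon^2$, so that $r^{-2} = 1 + O(\epsilon^2)$.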
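The covariance of $\xi_g$ is written in terms of the covariance of $J_g$ according to Eq. (5) using Wick's theorem:

\[ \langle J_i J_j \rangle - \langle J_i \rangle \langle J_j \rangle = e^{2\mu} \bigl( \langle e^{\xi_i + \xi_j} \rangle - \langle e^{\xi_i} \rangle \langle e^{\xi_j} \rangle \bigr). \tag{23} \]

For zero-mean Gaussian variables, $\langle e^{\xi_i + \xi_j} \rangle = \exp\bigl[ \tfrac{1}{2} \langle (\xi_i + \xi_j)^2 \rangle \bigr]$. Therefore, we have

\[ \langle J_i J_j \rangle - \langle J_i \rangle \langle J_j \rangle = e^{2\mu} e^{\frac{1}{2} ( \langle \xi_i^2 \rangle + \langle \xi_j^2 \rangle )} \bigl( e^{\langle \xi_i \xi_j \rangle} - 1 \bigr) = \langle J_i \rangle \langle J_j \rangle \bigl( e^{\langle \xi_i \xi_j \rangle} - 1 \bigr), \tag{25} \]

\[ \epsilon = r^{-2} \sigma^{-2} \ln \frac{\langle J_g J_{g+1} \rangle}{\langle J_g \rangle \langle J_{g+1} \rangle}. \tag{26} \]

Substituting Eq. (26) into Eq. (19), we have the shift of the threshold of the transition point $\beta_{\mathrm{ex}}$ as

\[ \beta_{\mathrm{ex}} = k^{-1} \Bigl( \frac{\langle \beta_g \beta_{g+1} \rangle}{\langle \beta_g \rangle \langle \beta_{g+1} \rangle} \Bigr)^{-r^{-2}}. \tag{27} \]

Since we assumed that $\beta_g$ obeys a common lognormal distribution, we have

\[ \beta_{\mathrm{ex}} = k^{-1} \Bigl[ 1 + \rho(\beta_g, \beta_{g+1}) \frac{V(\beta_g)}{\langle \beta_g \rangle^2} \Bigr]^{-1 + O(\epsilon^2)}, \tag{28} \]

where $\rho(\beta_g, \beta_{g+1})$ is a correlation coefficient which varies from $-1$ to $1$ and $V(\beta_g)$ is the variance of $\beta_g$. Ignoring the factor $O(\epsilon^2)$ in the exponent, the behavior of Eq. (28) is exemplified in Fig. 2. If $V(\beta_g)/\langle \beta_g \rangle^2 \sim O(1)$, the transition point would be lowered only down to a half of that of the independent process, while it is lowered significantly in the case where $V(\beta_g)/\langle \beta_g \rangle^2 > O(10^2)$; even when $\rho(\beta_g, \beta_{g+1}) = 0.2$, the diffusion is about twenty times more likely to be explosive than in the independent case.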
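
The two numerical statements above follow directly from the shifted threshold once the $O(\epsilon^2)$ term in the exponent is dropped. The helper below is a minimal sketch; its name and argument names are hypothetical, and `cv2` stands for the ratio of the variance of the retweet rate to the square of its average.

```python
def transition_point(k, rho, cv2):
    """beta_ex ~ (1/k) * (1 + rho * cv2)^(-1), ignoring O(eps^2) in the exponent.

    rho : correlation coefficient of the retweet rates of adjacent generations
    cv2 : variance of the retweet rate divided by the square of its average
    """
    return 1.0 / (k * (1.0 + rho * cv2))


half = transition_point(k=1.0, rho=1.0, cv2=1.0)          # fully correlated, cv2 = 1
strong = transition_point(k=1.0, rho=0.2, cv2=100.0)      # weak correlation, broad distribution
independent = transition_point(k=1.0, rho=0.0, cv2=100.0) # no correlation
```

With `k = 1`, the first case gives one half of the independent threshold, and the second gives a threshold lower by a factor of $1 + 0.2 \times 100 = 21$.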

Discussion and Conclusion. When we discuss explosive diffusion, the average of the retweet rate is not the only significant factor; its fluctuation and correlation may also play important roles. Equation (28) means that the transition point where the diffusion becomes explosive is shifted owing to the correlation $\rho(\beta_g, \beta_{g+1})$ of the retweet rates between the generations. The larger the variance $V(\beta_g)$ of the retweet rate compared to the square of its average $\langle \beta_g \rangle$, the easier it is to make the diffusion explosive. On the other hand, it is hopeless to expect information diffusion with a very narrow variance of the retweet rate to be explosive, unless the rate is constantly very close to the transition point $k^{-1}$.

We defined the transition point of the explosive diffusion as a theoretical reference for information diffusion on a social network in which the information reaches nodes that are extraordinarily far from the seed node. We showed how the correlation between the nodes enhances the chance of explosive diffusion. Although we used an approximation with respect to the off-diagonal matrix element $\epsilon$ in Eq. (15), its higher-order expansion is straightforward. Note that $\epsilon$ cannot be too large, in other words, $\rho(\beta_g, \beta_{g+1})$ cannot be close to one, in order to retain the positivity of the covariance matrix $\Sigma$, which also validates the perturbation expansion.
FIG. 2: (Color online) The transition point of the explosive diffusion, Eq. (28), as a function of the correlation coefficient $\rho(\beta_g, \beta_{g+1})$. We set $k = 1$ and ignored the factor $O(\epsilon^2)$ in the exponent.



The situation that we imagine for explosive diffusion is the diffusion of posts with funny jokes, poetic writing, news that is not broadcast by other mass media, etc. For Twitter, the transition point would be unrealistically hard to reach without the correlation between the generations. The significant change of the transition point due to the correlation seems to be essential in understanding why such posts sometimes diffuse explosively.

The transition point may still be hard to reach even after taking the correlation effect into account, presumably because we assumed that the underlying network has a loopless structure with a homogeneous degree distribution and an infinite path length. Removing these assumptions in order to analyze the diffusion more precisely is an interesting future problem. The loop correction would raise the transition point, and the inhomogeneity would describe the effect of the complex diffusion path. Although the average path lengths are usually very short for many real-world networks, the path length of the diffusion can be much longer than the average path length of the underlying network, because diffusion does not always occur along the shortest paths.